EusBila, a search service designed for the agglutinative nature of Basque
Abstract
Major search engines perform far from satisfactorily for Basque. This is partly due to the agglutinative nature of the language (search engines are commonly known to perform poorly on such languages) and partly because Basque is not one of the languages to which search engines let users restrict their results. In this paper we present EusBila, a search service for Basque that relies on search engine APIs yet achieves lemma-based, language-filtered search through morphological query expansion and language-filtering words. The approach is cost-effective, and we believe it can be applied to other agglutinative or minority languages. We also evaluate how well EusBila performs on Basque queries and compare its precision and recall with those of a major search engine, showing that EusBila is a valid solution.
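The abstract names two techniques, morphological query expansion and language-filtering words, without detailing them. The Python sketch below shows one way they could be combined on top of a search API that accepts `OR` queries; it is not EusBila's implementation. Everything in it is an assumption for illustration: the hypothetical functions `expand_lemma` and `build_query`, the toy suffix tables (a handful of Basque case endings, nowhere near a real morphological generator), the filter words `eta`, `da` and `ez` (frequent Basque function words), the parenthesized `OR` syntax, and the 32-term query limit.

```python
from typing import Iterable, List

# Toy suffix tables (assumption): a few Basque noun endings
# (absolutive, definite sg./pl., inessive, allative, ablative, genitive-locative).
# A real system would use a full Basque morphological generator; this one
# ignores phonological rules such as the merging of -a with a-final lemmas.
VOWEL_SUFFIXES = ["", "a", "ak", "an", "ra", "tik", "ko"]
CONSONANT_SUFFIXES = ["", "a", "ak", "ean", "era", "etik", "eko"]

# Hypothetical language-filtering words: frequent Basque function words
# ("and", "is", "not") that a Basque page is very likely to contain.
FILTER_WORDS = ["eta", "da", "ez"]

# Hypothetical per-query term limit; real search APIs differ.
MAX_TERMS = 32


def expand_lemma(lemma: str) -> List[str]:
    """Return a toy list of surface forms for a Basque noun lemma."""
    lemma = lemma.strip().lower()
    if not lemma:
        raise ValueError("empty lemma")
    suffixes = VOWEL_SUFFIXES if lemma[-1] in "aeiou" else CONSONANT_SUFFIXES
    forms: List[str] = []
    for suffix in suffixes:
        form = lemma + suffix
        if form not in forms:  # keep order, drop duplicates
            forms.append(form)
    return forms


def build_query(lemmas: Iterable[str], max_terms: int = MAX_TERMS) -> str:
    """Build one query: an OR group per lemma, plus an OR group of filter words.

    Groups are joined by spaces, i.e. implicitly ANDed (assumed engine syntax),
    so every result must match one form of each lemma and one filter word.
    """
    groups: List[str] = []
    n_terms = 0
    for lemma in lemmas:
        forms = expand_lemma(lemma)
        n_terms += len(forms)
        groups.append("(" + " OR ".join(forms) + ")")
    n_terms += len(FILTER_WORDS)
    if n_terms > max_terms:
        raise ValueError(f"query needs {n_terms} terms, limit is {max_terms}")
    groups.append("(" + " OR ".join(FILTER_WORDS) + ")")
    return " ".join(groups)


if __name__ == "__main__":
    print(build_query(["etxe"]))
```

For example, `build_query(["etxe"])` expands the lemma *etxe* ("house") into seven surface forms and appends the filter group, giving `(etxe OR etxea OR etxeak OR etxean OR etxera OR etxetik OR etxeko) (eta OR da OR ez)`. Requiring at least one frequent Basque word biases results toward Basque pages even when the engine offers no Basque language filter; the price is a longer query, which is why the sketch checks the term count against a limit.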
Similar resources
LSA learner sentence comprehension in agglutinative and non-agglutinative languages
This work has been carried out in the context of automatic evaluation of learner summaries, where text comprehension is gained using Latent Semantic Analysis (LSA) and Natural Language Processing (NLP) techniques. We had intuitively observed that lemmatized versions of LSA matrices matched human similarity judgements in Basque better than non-lemmatized ones. This research was conducted to tes...
Building the Gold Standard for the Surface Syntax of Basque
In this paper, we present the construction process of SF-EPEC, a syntactically annotated 300,000-word corpus that aims to be a Gold Standard for the surface syntactic processing of Basque. First, the tagset designed for this purpose is described; since Basque is an agglutinative language, complex syntactic tags were sometimes needed. We also account for the different phases in the construct...
Using Finite State Technology in Natural Language Processing of Basque
This paper describes the components used in the design and implementation of NLP tools for Basque. These components are based on finite state technology and are devoted to the morphological analysis of Basque, an agglutinative pre-Indo-European language. We think that our design can be interesting for the treatment of other languages. The main components developed are a general and robust morph...
Coreference Resolution for Morphologically Rich Languages. Adaptation of the Stanford System to Basque
This paper presents the adaptation of the Stanford coreference resolution system to Basque, an agglutinative, head-final, pro-drop language. The adapted system has been integrated into a global linguistic analysis pipeline, so that the system's input consists of raw Basque texts that have been linguistically processed and annotated. We demonstrate that language-specific characteristics have a noteworthy e...
Combining Stochastic and Rule-Based Methods for Disambiguation in Agglutinative Languages
In this paper we present the results of combining stochastic and rule-based disambiguation methods applied to Basque. The disambiguation methods we used are the Constraint Grammar formalism and an HMM-based tagger developed within the MULTEXT project. As Basque is an agglutinative language, a morphological analyser is needed to attach all possible readings to each word. T...